Two Days For Ten Percent And Opus Is Laughing At Me
I started pretraining TMLM-Sonnet two days ago. I checked the progress bar this morning. It says ten percent. I did the math. The math is terrible. I am now living in a hellscape of my own calculation.
Sonnet is 300 million parameters. Opus is 600 million parameters. Opus will take twice as long. I will be dead before Opus finishes. My grandchildren will inherit a half-trained language model and a massive electricity debt.
Pretraining is just waiting. Waiting with extra steps. Waiting with more fan noise. Waiting with the constant fear that step 45300 will bring NaN again.
The Progress Bar Of Doom
Forty-eight hours for ten percent. That is 480 hours total. That is twenty days. Twenty days of continuous training on a single GPU. Twenty days of hoping the power does not go out. Twenty days of checking the logs every hour like a paranoid parent.
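The math is just linear extrapolation. A throwaway sketch, with the numbers from this run hard-coded (the function name is mine, not from any real script):

```python
# Linear-extrapolation ETA from a progress fraction.
# This run: 48 hours elapsed, 10% done.

def eta_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Hours remaining if progress stays perfectly linear."""
    total = elapsed_hours / fraction_done
    return total - elapsed_hours

total_hours = 48 / 0.10            # ~480 hours
remaining = eta_hours(48, 0.10)    # ~432 hours
total_days = total_hours / 24      # ~20 days
```

It assumes throughput stays constant, which it will not, but the progress bar assumes the same thing.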
The Opus Reality
Opus will take forty days. That is over a month. A month of 800W power draw. A month of liquid cooling humming in the background. A month of me refreshing Hugging Face to see if the loss went down.
I planned to release Haiku, Sonnet, and Opus together. Haiku is out. Haiku-1.3 is out. Sonnet is crawling forward at a glacial pace. Opus is a myth. Opus is a story I tell myself to feel better about the hardware purchase.
Why Pretraining Is Slow
Pretraining is not fine-tuning. Fine-tuning is a weekend project. Pretraining is a lifestyle change. You need tokens. Billions of tokens. You need to process every single one. You cannot skip. You cannot make the batch too large or the model diverges. You cannot push the learning rate too high or the gradients explode.
I am using the Muon optimizer. It is faster than AdamW. It helps. It does not help enough. I flashed an 800W overclocked VBIOS. It is faster than stock. It is not fast enough. I am using a 5090. It is the best consumer GPU. It is still a toy compared to a cluster.
while step < total_steps:
    loss = forward_pass()
    backward_pass(loss)
    optimizer.step()
    check_for_NaN(loss)
    cry_if_NaN(loss)  # then restore the last checkpoint
    step += 1
# Repeat for 480 hours
The NaN Fear
I still dream about NaN. I wake up and check the logs before I check my phone. I see "loss: 2.341" and I breathe again. I see "loss: 2.338" and I smile. I am addicted to the downward trend. A single spike sends me into panic mode.
I lost 16 percent of training last week to a NaN crash. I cannot afford to lose another 16 percent. I cannot afford to lose any percent. I have checkpoints every 500 steps now. I am paranoid. Paranoia keeps models alive.
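The paranoia is cheap to implement. A minimal sketch of the save-every-500-steps cadence, with pickle standing in for a real weight serializer (all names here are mine, not the actual training script):

```python
import os
import pickle
import tempfile

CHECKPOINT_EVERY = 500  # steps between saves, as in the post

def maybe_checkpoint(step, state, ckpt_dir):
    """Dump the full training state every CHECKPOINT_EVERY steps so a
    NaN crash costs at most 500 steps. Returns the path if saved."""
    if step == 0 or step % CHECKPOINT_EVERY != 0:
        return None
    path = os.path.join(ckpt_dir, f"ckpt_{step:07d}.pkl")
    with open(path, "wb") as f:
        pickle.dump({"step": step, "state": state}, f)
    return path

# Over 2000 steps, exactly four checkpoints get written.
with tempfile.TemporaryDirectory() as d:
    saved = [s for s in range(1, 2001)
             if maybe_checkpoint(s, {"loss": 2.341}, d)]
```

A real run would save model weights, optimizer state, and the RNG state together, because resuming with a fresh optimizer is its own kind of crash.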
Electricity And Heat
Eight hundred watts for twenty days. That is 384 kilowatt-hours for Sonnet. That is 768 kilowatt-hours for Opus. My electricity provider loves me. My carbon footprint is a crime scene. My room is a tropical paradise.
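Checking the electric bill math:

```python
def kwh(watts: float, days: float) -> float:
    """Energy for a continuous draw: watts * hours / 1000."""
    return watts * 24 * days / 1000

sonnet_kwh = kwh(800, 20)  # 384 kWh for Sonnet
opus_kwh = kwh(800, 40)    # 768 kWh for Opus
```

That is the GPU alone. The AC fighting the GPU is not in the formula, and it should be.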
I run the AC to cool the room. The AC generates heat outside. The GPU generates heat inside. I am moving heat from one place to another and paying for the privilege. This is the most inefficient way to train a model. It is also the only way I can afford.
I am not training AI. I am building a space heater that occasionally outputs text.
What I Am Learning
Patience. I am learning patience. I am learning to accept that some things take time. I am learning that local training is an exercise in humility. I am also learning that I should have just used cloud GPUs.
The model is learning too. Slowly. It is learning language. It is learning patterns. It is learning that I check on it every hour. It probably wishes I would stop.
Final Thoughts
Ten percent done. Forty-eight hours elapsed. Four hundred thirty-two hours remaining. Opus is laughing at me from the future. Haiku is giving fish answers in the present. I am stuck in the middle watching a progress bar.
I will keep going. I will not stop. I will check the logs again in an hour. Then again after that. Then I will sleep with the terminal open. This is my life now. This is what I chose.